ABSTRACT


The use of a Continuous Voice Recognition System for data input to Tactical
Tables in the Combat Information Center would improve the man-machine
interface and decrease the reaction time of operators who run the tables. The
results of this study show that the delay times of trained personnel using manual
typing input methods were far greater than when they used continuous speech
input to run two tactical tables. Using a Verbex Series 5000 Version 3.00
continuous speech recognition system, the operators' reaction times improved
by a factor of 3.3, and at the same time they committed fewer data entry errors
when running the tables with speech input. The subjects who participated in the
experiments also reported that the freedom allowed by speech input
was an improvement over manual typing input methods. Using speech input, one
operator could run two tactical tables, a job that currently takes two to three
people.


TABLE OF CONTENTS


I. EXECUTIVE SUMMARY

II. INTRODUCTION

A. SPEECH RECOGNITION TECHNOLOGY

1. Background

2. Types of Speech Systems

3. Voice Recognition, Verification, and Identification

B. VERBEX SERIES 5000 VERSION 3.00 VOICE RECOGNIZER

1. Recognizer File

2. Voice File

C. DESCRIPTION OF THE PROBLEM

III. OBJECTIVE

IV. DESCRIPTION OF THE SIMULATION

A. PROCEDURE FOR THIS EXPERIMENT

V. RESEARCH SCENARIO

VI. EXPERIMENTAL DESIGN

VII. DEPENDENT VARIABLES

VIII. RESULTS

A. RESULTS FOR SCENARIO TIMES

B. RESULTS FOR ERRORS

C. TIMING RESULTS ON INDIVIDUAL UTTERANCES

D. SUBJECTIVE QUESTIONNAIRE RESULTS

IX. OTHER OBSERVATIONS

X. CONCLUSIONS

APPENDIX A

APPENDIX B

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST


LIST OF TABLES


Table I. Times for Oral and Manual Input Method

Table II. Timing results on individual utterances


LIST OF FIGURES


Figure 1: Top View of the C.I.C.

Figure 2: Configuration of the Experimental Simulation

Figure 3: Flow Chart

Figure 4: P.P.I. Display of the scenario used

Figure 5: Errors Input to the system with Oral and Manual Mode


I.  EXECUTIVE  SUMMARY 


This study describes an experiment in which military
officers used a Continuous Voice Recognition System, VERBEX
Series 5000 Version 3.00, as a data input method for Tactical
Tables.

The objective was to prove that when using normal typing
input, delay times of even the best-trained personnel are
greater than those of a person using a Continuous Voice
Recognition System as an input method for Tactical Tables.

Ten military officers were introduced to the voice
equipment for the first time, trained on the system by
creating their own voice patterns, and then ran the scenario
which was given to them.

In  the  experiment,  subjects  followed  a  fixed  scenario  of 
instructions  which  they  performed  one  time  with  voice  input, 
and  then  a  second  time  with  manual  input. 

The  scenario  was  designed  to  take  about  six  minutes  to 
perform  with  normal  manual  input. 

Most  of  the  subjects  (eight  out  of  ten)  had  some 
familiarity  with  other  Voice  Recognition  Systems.  All  subjects 
were  introduced  to  the  VERBEX  Series  5000  Version  3.00  System 
and  instructed  on  what  they  would  be  doing. 

The  results  show: 

1) Voice Input was 3.3 times faster than the manual typing
input.

2) Manual Typing Input had 5.69 times more entry errors.

3) Voice Input did not give any erroneous reply, and in
case of doubt did not reply at all, thereby requiring the
user to repeat his utterance.

We have observed here that with minimal practice and
minimal experience with the system, the job was done faster
and with fewer errors.


II. INTRODUCTION


It is well known that Naval Operations today are carried out
by specially trained personnel, owing to the complexity
of existing systems such as the Tactical Tables that support
advanced weapon systems.

The situation in the Persian Gulf, caused by the invasion
of Kuwait by Iraq, proved the importance of minimizing
operator reaction time to obtain the tactical
advantage over the enemy. As an example we can mention
the Scud missiles launched by Iraq against Israel, and the
inability of the latter to intercept them even though it
possessed very sophisticated weapons like the "Patriot"
anti-missiles. The defenders could have been more effective if
they had been able to reduce their reaction time with the use of a
Continuous Voice Recognition System.

Even the best-trained personnel cannot eliminate the delay
times caused by the "middle man" (a possible source of
misinterpretation). This guides us to the use of "real time"
systems offered by the new technology, such as Continuous
Voice Recognition Systems.

It has been acknowledged that speech is a human being's
most effective and therefore most comfortable means of
communicating (Strategic Computing Program, Integration,
Transition and Performance Evaluation of Speech Technology,
1985). [Ref. 1]

Computers which can operate via voice commands may be
logical alternatives as input devices, eliminating the
"middle man" and with it most possible sources of
misinterpretation.

As LeFever states, "Increased use of computers in
problem-solving will demand more emphasis on man-machine
interfaces. Speech Recognition will be that interface which
makes the computer a true extension of man" (LeFever, 1987).
[Ref. 2]

A.  SPEECH  RECOGNITION  TECHNOLOGY 

1. Background

The original development of speech Input/Output (I/O)
technology can probably be traced to the 1950's and
1960's, when many of the larger technical companies were doing
basic research on recognizing spoken digits.

A few of the companies involved in those days were
IBM, Bell Telephone Laboratories, RCA, and Philco-Ford.
As events unfolded, the first commercially available products
came on the market in the early 70's with the Speech
Recognition Systems offered by Threshold Technology, Inc. and
Scope Electronics, Inc.

Between 1971 and 1976, the Advanced Research Projects
Agency (ARPA) funded a large $15 million research effort. The
goal of this effort was to produce systems which could
interpret or "understand" vocabularies of 1000 words when used
in continuously spoken sentences or phrases.

A variety of industrial companies, academic
institutions, and institutes worked on what was known as the
ARPA Speech Understanding Research (SUR) project. The
1978-1980 period saw a good variety of Speech Recognizers become
commercially available for a few hundred dollars up to $20,000
and more. Most products in that era were isolated word
(utterance) type systems, as opposed to the connected speech
systems which have become more prevalent in the 1980's. An
utterance is one or more words in a phrase (Poock, 1984). [Ref. 3]

2. Types of Speech Systems

There are four generic types of Speech Recognition
Systems. One delineation is that a recognition system is
either speaker dependent or speaker independent.

If a system is speaker dependent, then it requires
samples of the potential user's voice to be in memory
in order to work properly. A speaker dependent system is
basically tuned for the user's voice and should work better
than a speaker independent system because it contains
samples of the actual user's voice.

A speaker independent system contains algorithms which
supposedly can handle many different voices and dialects, i.e.,
the system should be able to recognize the voice of anyone who
tries to use it. Thus it requires no previous samples of a
given user's voice but rather contains an algorithm which is
robust enough to recognize anyone who talks to it. Both
systems today achieve accuracies in the 97-99% range for
vocabularies of several hundred words (utterances) in a normal
office type environment.

The other generic breakdown of Speech Recognizers is
whether they are of the discrete isolated word type or
a connected (continuous) Speech System. Either one could
be speaker dependent or speaker independent.

In a discrete system, the user has a given number of
sound patterns in memory. A sound pattern can be one or
several words in a continuous phrase of sounds. When using the
discrete system, the user must pause about 0.10 seconds between
each utterance he/she makes.

When  the  Recognizer  "hears"  the  pause,  it  knows  that 
was  the  end  of  an  utterance  and  therefore  starts  to  search  in 
the  memory  for  what  was  just  said. 
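
The pause-based end-of-utterance rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any particular recognizer: the input is taken to be a list of per-frame energy values, and the names `split_on_pauses`, `frame_s`, and `silence_level` are hypothetical. Only the pause length of about 0.10 seconds comes from the text.

```python
from typing import List, Tuple


def split_on_pauses(
    energies: List[float],
    frame_s: float = 0.01,        # assumed frame length in seconds
    silence_level: float = 0.1,   # assumed energy threshold for "silence"
    min_pause_s: float = 0.10,    # pause length quoted in the text
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of each utterance."""
    min_pause_frames = round(min_pause_s / frame_s)
    utterances = []
    start = None      # first frame of the utterance in progress, if any
    silent_run = 0    # consecutive silent frames inside that utterance
    for i, energy in enumerate(energies):
        if energy > silence_level:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # The pause is long enough: close the utterance where
                # the silence began and wait for the next one.
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Input ended before a full pause was heard.
        utterances.append((start, len(energies) - silent_run))
    return utterances
```

A short silence inside a phrase (shorter than the pause length) does not split the utterance, which is why the user of a discrete system has to pause deliberately between utterances.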

Connected Speech, on the other hand, requires no
pauses between utterances or phrases.

A slight distinction is made by some in the speech
community between Continuous Speech and Connected Speech.

Connected Speech allows the user to speak in a
natural manner while the computer stores the spoken words in
a buffer memory. When the speaker pauses for a breath or
between phrases, the information in the buffer is presented to
the processors for recognition and appropriate action.

Continuous Speech also allows the speaker to talk in
a natural manner; the fundamental difference is that the
system continually recognizes what is being said and responds
accordingly. Continuous Systems are most natural and
comfortable for the user in an interactive environment, but
they require more advanced technology (Poock, 1984). [Ref. 3]

3. Voice Recognition, Verification, and Identification

Voice Recognition means that we are simply trying to
recognize the pattern of sound that was uttered.

Voice Verification means that the user identifies
himself by some means such as a Personal Identification Number
(PIN) or a similar technique, and then the system verifies
whether "it is" or "it is not" the real user.

In Voice Identification the entire data base is
searched to try to identify the speaker. This is much harder,
because the user simply asks to be identified. The process may
take longer, but the advantage is that the user need not
remember or enter any password or PIN (Poock, 1984). [Ref. 3]
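
The difference between verification (one comparison against a claimed identity) and identification (a search over the whole data base) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: voice patterns are reduced to short numeric vectors, similarity is plain Euclidean distance, and the names `verify`, `identify`, `enrolled`, and `threshold` are hypothetical. Real systems use far richer acoustic models.

```python
import math
from typing import Dict, List, Optional

Template = List[float]  # toy stand-in for a stored voice pattern


def verify(claimed_id: str, sample: Template,
           enrolled: Dict[str, Template], threshold: float = 1.0) -> bool:
    """Verification: compare the sample with ONE stored template,
    the one belonging to the identity the user claims (e.g. via a PIN)."""
    template = enrolled.get(claimed_id)
    return template is not None and math.dist(sample, template) <= threshold


def identify(sample: Template, enrolled: Dict[str, Template],
             threshold: float = 1.0) -> Optional[str]:
    """Identification: search EVERY stored template for the closest match;
    return None when even the best match is too far away."""
    best_id, best_distance = None, float("inf")
    for user_id, template in enrolled.items():
        d = math.dist(sample, template)
        if d < best_distance:
            best_id, best_distance = user_id, d
    return best_id if best_distance <= threshold else None
```

The sketch makes the cost difference visible: `verify` does a single comparison, while the work in `identify` grows with the number of enrolled speakers, which is consistent with the remark that identification may take longer.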

B.  VERBEX  SERIES  5000  VERSION  3.00  VOICE  RECOGNIZER 

The VERBEX Voice I/O System is a computer peripheral that
allows users to send data to computers by voice. In many
computer applications, the Recognizer will work as a
replacement for a keyboard, leaving the operator's hands and
eyes free (VERBEX Grammar Development Manual, 1990). [Ref. 4]

In order to understand what the VERBEX Recognizer does,
think of the Recognizer as a translator which performs much
the same function for a Host computer as a human translator
does for a foreigner. The Recognizer translates words spoken
through the headset microphone into computer code the Host
computer can understand, and then sends this information to
the computer. This process is called Recognition.

Understanding and translating spoken language into digital
information is a truly monumental feat for any machine. To
accomplish this task, the Recognizer contains sophisticated
computers of its own. Before the Recognizer can begin
recognition, its internal processors must be given two banks
(files) of information:

1. Recognizer File

The Recognizer File contains the following:

a. A list of the words the user is going to say during the
application (a Vocabulary).

b. Simple rules about the orders and patterns in which these
words may be spoken (a Grammar).

c. A table of computer codes for each word (the Translation
Table). Grammar definition sections specify what
statements the Voice I/O System can recognize, but not
what the system should send to the host computer in
response to each recognition. By adding a translation
table to the grammar-definition file, the designer can
specify the content of the message the Voice I/O System
will send to the host computer when a valid statement has
been recognized. For example, if the word MISSILE is
recognized as part of an utterance and we want
only the letter M to appear on the screen, we type a
translation table in the following format, for an
application containing a single grammar:


! grammar_name= 

♦recognition 

♦grammar 

♦translations 

Cinitiator 

| separator 

>terminator 

MISSILE  M| 015. 

d. Voice Response information: words and sentences
to be spoken by the Recognizer's internal speech
synthesizer (the Voice Response Facility) when certain
statements are recognized.

e. A list of values for certain internal parameters which
the user can set in the Recognizer File (called a Setup
Block). When a Recognizer File containing a Setup Block
is accessed by the Recognizer, all Setup Parameters are
set; those listed in the Setup Block are set to the
values listed, and all those not listed in the Setup
Block are set to their default values. For example, suppose
we want to change the voice_settings_speed parameter from
its default value of 5 to 6 (allowed values 0 to
9) and voice_settings_pitch from 5 to 7 (allowed
values 0 to 9). The Setup Block will look like this:

♦set  up 

voice_settings_speed  =  6 

voice_settings_pitch  =  7 
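
The behavior of the Setup Block and of the Translation Table described above can be mimicked with a small Python sketch. This models only the semantics stated in the text (listed parameters take the listed values, unlisted ones fall back to their defaults; a recognized word is replaced by its translation). It does not parse the actual VERBEX file format, and the function names, the dictionary representation, and the rule that words without a table entry send nothing are all assumptions made for illustration.

```python
from typing import Dict, List

# Defaults and allowed range (0 to 9) as given in the text for these two
# parameters; a real Recognizer has more parameters than are shown here.
DEFAULTS: Dict[str, int] = {
    "voice_settings_speed": 5,
    "voice_settings_pitch": 5,
}


def apply_setup_block(setup_block: Dict[str, int]) -> Dict[str, int]:
    """Listed parameters take the listed values; all others keep defaults."""
    settings = dict(DEFAULTS)
    for name, value in setup_block.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown setup parameter: {name}")
        if not 0 <= value <= 9:
            raise ValueError(f"{name} must be between 0 and 9, got {value}")
        settings[name] = value
    return settings


def translate(recognized_words: List[str], table: Dict[str, str]) -> str:
    """Build the message for the host from the words of one utterance.
    Assumption: a word with no table entry contributes nothing."""
    return "".join(table.get(word, "") for word in recognized_words)
```

For instance, `apply_setup_block({"voice_settings_speed": 6})` would leave the pitch at its default of 5, and `translate(["ASSIGN", "MISSILE"], {"MISSILE": "M"})` would produce only the letter M.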

2. Voice File

The Voice File contains a library of sound patterns
for all the words in the Recognizer File, both as they sound
when they are spoken individually ("discretely") and as they
sound when they are spoken together ("continuously") in the
orders and patterns set forth in the Grammar in the Recognizer
File.

Only when the Recognizer has both a Recognizer File and a
Voice File can it begin recognition.

Once those files are created, the user must transfer the
Recognizer File into one of the sections, or images, in the
Recognizer's memory. Once there, it can be stored on a voice
cartridge.

Each user must train the Recognizer using the sound of
his/her voice. The sound patterns ("Voice File") created
through Training can then be stored on the Voice Cartridge
along with the Recognizer File.

After the files are stored on a voice cartridge, using the
system becomes a one-step process. Whenever the Recognizer is
to be used, the user need only switch it on and insert the
voice cartridge into the cartridge bay. The voice cartridge
contains the Recognizer File and the operator's Voice File.

The Recognizer then automatically begins listening and
translating what it hears (VERBEX Installation Manual, 1990).
[Ref. 5]

C. DESCRIPTION OF THE PROBLEM

It is a reality today that Naval Operations are carried out
by specially trained personnel, owing to the complexity
of existing systems such as Tactical Tables for Detection,
Tracking, Recognition, and Establishing Targeting Priorities
for the available onboard weapons systems.

These persons have to work under difficult conditions in
CICs (Combat Information Centers), which are usually confined,
dimly lit rooms. It must be emphasized that under
difficult conditions such as rough seas, a person's abilities
are limited to performing only the essential functions needed
to overcome the difficulties he/she is facing. Any unnecessary
movement within the operational area, or even the repetition
of an order, becomes undesirable.

Under these conditions the COs, XOs, or even OPS
Officers/CIC Officers, who are responsible for evaluating
a combination of information passed to
them by different sources, may mentally lose track of a number
of their operational personnel.

The situation is further complicated by the fact that
after the evaluation process they have to give an order which
must be executed immediately, and their personnel
do not respond fast enough.

Even with the best-trained personnel the delay times
cannot be eliminated. The "middle man", undesirable in any case
but impossible to avoid, generates many
misinterpretation problems, which in real time situations may
be crucial or even fatal.

Sometimes the CO, XO, or the OPS/CIC Officer is
away from his position and unable to enter data, or his
hands are busy and cannot be used at the specific time needed;
this delay may give the enemy the tactical advantage.

It is known that the final decision is always left to the
CO of every unit; sometimes this same person has to be on the
bridge instead of in the CIC and, in an emergency,
he cannot order immediate action without using one
or more "middle men".

It is obvious that navies all over the world are
trying to obtain more "real-time" systems, but are
instead operating them with the
"middle man" as the interface between the decision maker and
the computing system.

Using Speech Recognition as that interface will make the
computer a true extension of man, solving many problems
created by the need to use our hands and eyes in low light
conditions and with distance limitations.

The use of new technology such as the VERBEX
Series 5000 Version 3.00 would not only give operators the
advantage of being able to transmit information and
operational decisions safely and in a timely manner, but would
also reduce the number of personnel needed to run the available
equipment and thus reduce training hours in favor of other
activities.

The different features offered by these devices provide the
security level needed to limit the personnel operating
the Speech Recognition System to
just the essential ones.

III. OBJECTIVE


The objective of this thesis is to investigate the
possible use of a Continuous Voice Recognition System (in this
case a VERBEX Series 5000 Version 3.00) for inputting
operational information to Tactical Tables in Naval
Operations.

A secondary objective is to measure the accuracy of the
system, how much it may influence our reaction time, and how
credible this system is compared with the procedures currently
used for Tactical Tables in Naval Operations.

IV. DESCRIPTION OF THE SIMULATION


A C.I.C. room includes in its systems two Tactical Tables
which do not communicate with each other. Tactical Table
number one is used to assign guns and torpedoes
only to the targets we want to attack; as supplementary
features it can also detect, track, and characterize the
different targets and plot their rackets. Tactical Table
number two is used for detecting, tracking, and
assigning only missiles to the designated targets.

The two Tactical Tables are distant from each other, as
shown in Figure 1; so to see both screens
(P.P.I.s), the CO, XO, or the OPS Officer has to move in
order to ensure that exactly the same information has been
entered in both systems.

To enter any information in either of the two Tactical
Tables, the operator has to use his left hand for the
keyboard and his right hand for the rolling ball, which
moves the targeting cross and gives, at any time, the bearing
and range of its position on the screen relative to the ship.

Operations in the C.I.C. are always done in the dark, and
the use of the eyes as well as the hands becomes more difficult,
especially when we need to find specific buttons with
special features at the specific time we need them.

The location of the C.I.C. varies from one type of ship to
another, and most of the time it is located far from the bridge,
thus hampering the movements of the CO when he is needed in
another location.

Here comes the role of the "middle man" or of the
intercom system, which introduces many misinterpretations into
the voice commands of the decision maker when he delegates
commands to the operational center.

A.  PROCEDURE  FOR  THIS  EXPERIMENT 

To simulate the two Tactical Tables, two personal
computers (PCs) were connected with a T-Connector to the
VERBEX Series 5000 Version 3.00, as shown in Figure 2. The two
PCs were four meters apart from each other, as in the real
C.I.C. environment.

PC-1 had the ability to communicate with the VERBEX Voice
System and perform all the features this system offers; PC-2
was used as a monitor.

The voice application, a text file on disk called a
grammar file, represented the different functions of the
Tactical Tables. The grammar file is created with
a standard text editor.

A unique grammar file is necessary for each application of
a Speech Recognition System; its contents are the
data necessary to operate the application program.

The grammar file anatomy developed for this application is
shown in Figure 3.

Each function of the Tactical Tables was represented by an
utterance in the grammar, and with the use of the CONVERT
software, this text file was converted into a binary,
machine-readable recognizer file suitable for downloading to a
VERBEX Recognizer.

This recognizer file was transferred to the VERBEX Voice
cartridge with a file transfer tool named TRANSFER.

From this moment on, a user's voice patterns could be added
to the cartridge through the TRAINING process.

Trying to represent the functions of a Tactical Table as
realistically as possible, the author simulated the movements
of the rolling ball with spaces in the scenario used for the
typing part of the experiment.

The experiment differs from reality in its light level,
because it was done in daylight instead of
the darkness of a ship's C.I.C. Therefore any
results obtained from the typing input condition of this
experiment would only be expected to be worse in the darker
conditions found in a C.I.C. in real world operations of a
ship.

Figure 3: Flow Chart.

V. RESEARCH SCENARIO

This research design was created to simulate reality as
closely as possible for each case.

As mentioned before, for this experiment the
subjects followed a fixed scenario which comprised two
different parts: a written part and an oral one.

The purpose of having two parts was to illustrate the
difference between the time needed to perform a specific task
using the VERBEX Series 5000 Version 3.00 as an input mode to
the Tactical Tables and the time needed with the original
manual input mode.

In the scenario, we are operating in the central Aegean
Sea, an archipelago that makes up 2,463 of Greece's total of
3,100 islands. The Aegean Sea as an entity, together with the
Greek mainland and the mosaic of the islands which it
includes, is an area of vital strategic importance and is
absolutely necessary for the defense of Greece. This is
because it provides the strategic and tactical depth required
for maneuvers and support of military operations and ensures
strategic warning against a preemptive massive air attack.

This area also constitutes successive defense barriers in
depth, since it is an extension of the Dardanelles Straits and
provides, to whoever holds it, the capability to control the
sea exits through the Aegean to the Mediterranean. Moreover,
in conjunction with the island of Crete, it provides for the
full control of the southeastern Mediterranean region (THREAT
in the Agean, 1987). [Ref. 6]

In the scenario, the subjects are on board a Hellenic Navy
FPBGMT (Fast Patrol Boat with Guns, Missiles, and Torpedoes)
and the tactical situation is shown in Figure 4.

Every target which appears in Figure 4 is used in both the
Speech Recognition part and the manual typing part of the
scenario.

The experimental design for the manual part was formulated
to simulate, as realistically as possible, the movements made
by the operator of the Tactical Table when working manually; the
spaces provided were to simulate the delay times due to the
use of the rolling ball with the right hand, which controls
the position of the cursor.

The experimental design for the Speech Recognition
condition made use of standard Naval terminology to create the
utterances in the grammar file. This permitted subjects to use
familiar terms and perform the task with minimal training.

The voice orders for the Speech Recognition part of the
experiment, using the VERBEX Series 5000 Version 3.00,
were the following:

Say: Target 030 Tack 20 Point 0 Designated 01
Say: Target 060 Tack 20 Point 0 Designated 02
Say: Target 090 Tack 25 Point 0 Designated 03
Say: Target 120 Tack 15 Point 0 Designated 04
Say: Target 150 Tack 20 Point 0 Designated 05
Say: Target 180 Tack 10 Point 0 Designated 06
Say: Target 240 Tack 25 Point 0 Designated 07
Say: Target 270 Tack 20 Point 0 Designated 08
Say: Target 300 Tack 25 Point 0 Designated 09
Say: Target 330 Tack 25 Point 0 Designated 10
Say: Target 01 Display Surface
Say: Target 02 Display Surface
Say: Target 03 Display Surface
Say: Target 04 Display Surface
Say: Target 05 Display Surface
Say: Target 06 Display Surface
Say: Target 07 Display Air
Say: Target 08 Display Air
Say: Target 09 Display Surface
Say: Target 10 Display Surface
Say: Target 01 Friendly
Say: Target 02 Friendly
Say: Target 03 Neutral
Say: Target 04 Hostile
Say: Target 05 Neutral
Say: Target 06 Hostile
Say: Target 07 Hostile
Say: Target 08 Hostile
Say: Target 09 Neutral
Say: Target 10 Neutral
Say: Plot Target 04
Say: Plot Target 06
Say: Plot Target 07
Say: Plot Target 08
Say: Plot Racket 060
Say: Plot Racket 120
Say: Plot Racket 240
Say: Plot Racket 270
Say: Assign Target 04 Gun
Say: Assign Target 06 Torpedo
Say: Assign Target 07 Missile
Say: Assign Target 04 Missile

Figure 4: P.P.I. Display of the scenario used.

The manual inputs for the experiment were the following:

Type: 030-20.0 01
Type: 060-20.0 02
Type: 090-25.0 03
Type: 120-15.0 04
Type: 150-20.0 05
Type: 180-10.0 06
Type: 240-25.0 07
Type: 270-20.0 08
Type: 300-25.0 09
Type: 330-25.0 10
Type: 01 Surface
Type: 02 Surface
Type: 03 Surface
Type: 04 Surface
Type: 05 Surface
Type: 06 Surface
Type: 07 Air
Type: 08 Air
Type: 09 Surface
Type: 10 Surface
Type: 01 Friendly
Type: 02 Friendly
Type: 03 Neutral
Type: 04 Hostile
Type: 05 Neutral
Type: 06 Hostile
Type: 07 Hostile
Type: 08 Hostile
Type: 09 Neutral
Type: 10 Neutral
Type: 04 P
Type: 06 P
Type: 07 P
Type: 08 P
Type: 060 R
Type: 120 R
Type: 240 R
Type: 270 R
Type: 04 G
Type: 06 T
Type: 07 M
Type: 08 M
Type: 04 M

The words Say and Type in the above orders were not to be
said or typed; they were written to help the
subjects recognize the beginning of a new utterance or
entry in each case.

In other words, when the subject said "Target 030
Tack 20 Point 0 Designated 01" during the Speech Recognition
part, or typed "030-20.0 01" in the manual
typing part, he/she got the same result on the Tactical
Table, which is "01" in the upper right hand quarter of the
cursor's position on the P.P.I.
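
The correspondence between the spoken orders and the typed entries listed above can be written down as a small Python sketch. It is not the grammar or translation table used in the experiment; it only restates the pairs shown in the two lists (for example, "Tack" standing for the dash and "Point" for the decimal point). The function name `spoken_to_typed` and the use of regular expressions are choices made for illustration.

```python
import re

# Each rule pairs a pattern for one family of spoken orders with a
# function that renders the equivalent typed entry.
_RULES = [
    (re.compile(r"Target (\d{3}) Tack (\d+) Point (\d) Designated (\d{2})"),
     lambda m: f"{m[1]}-{m[2]}.{m[3]} {m[4]}"),
    (re.compile(r"Target (\d{2}) Display (Surface|Air)"),
     lambda m: f"{m[1]} {m[2]}"),
    (re.compile(r"Target (\d{2}) (Friendly|Neutral|Hostile)"),
     lambda m: f"{m[1]} {m[2]}"),
    (re.compile(r"Plot Target (\d{2})"),
     lambda m: f"{m[1]} P"),
    (re.compile(r"Plot Racket (\d{3})"),
     lambda m: f"{m[1]} R"),
    (re.compile(r"Assign Target (\d{2}) (Gun|Torpedo|Missile)"),
     lambda m: f"{m[1]} {m[2][0]}"),  # first letter: G, T, or M
]


def spoken_to_typed(utterance: str) -> str:
    """Return the typed entry equivalent to one spoken order."""
    text = utterance.strip()
    for pattern, render in _RULES:
        match = pattern.fullmatch(text)
        if match:
            return render(match)
    raise ValueError(f"utterance not in the scenario vocabulary: {utterance!r}")
```

Under these rules `spoken_to_typed("Target 030 Tack 20 Point 0 Designated 01")` yields `"030-20.0 01"`, the pair used as the example in the text.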
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VI.  EXPERIMENTAL  DESIGN 


The experiment was run from 12:00 to 14:00, so the subjects had not rested after their daily schedule of classes.

Subjects  individually  met  with  the  experimenter  initially 
and  were  told  about  the  basic  ideas  of  how  the  voice 
recognition  equipment  worked,  what  they  were  expected  to  say, 
and  how  the  training  on  the  equipment  would  be  performed.  At 
the  same  time,  they  were  informed  about  the  simulation  and  the 
goal  of  the  experiment  in  order  to  have  a  complete 
understanding  of  what  they  were  to  do. 

After the experiment, they were given a subjective questionnaire regarding their opinions about voice input versus manual typing input for use with Tactical Tables.

Each subject trained the system using the Emulate software utility (which makes the Host Computer connected to the Host/Computer port of the VERBEX Recognizer function as though it were an ASCII terminal connected to the User/Terminal port). As part of this process, they accessed the Recognizer's Setup Mode menus and trained the system with their own voice patterns.

The Recognizer first gave them the words and then the utterances/phrases that were used in the experiment. In this way each person "taught" the Voice Recognition System the sound of his/her voice speaking the words and sample phrases in the Recognizer File.

The sound patterns that the Recognizer learned from each subject were then stored along with the Recognizer File on a Voice Cartridge (VERBEX Software Utilities Manual, 1990). [Ref. 7]

Once each subject had stored his/her voice patterns on the cartridge, he/she performed the oral part of the experiment by going through the voice order list given in Chapter V.

Every time the subject spoke into the microphone, the spoken utterance was displayed on both screens (representing the two Tactical Tables) exactly as it appeared in the grammar definition lines. In other words, the utterances looked different on the screens from the way they were spoken, but showed up exactly as they would in the real case on the P.P.I.s of the Tactical Tables.

The chronometer was started when the subject began reading the list and stopped when the subject reached the end of the list.

For the manual part, the subject had to enter every line of the typing list given in Chapter V one by one, first on PC-1 and then on PC-2, which was four meters away from the first one. The experimenter ran the chronometer for as long as the subject was performing the typing test. Finally, every subject answered the questionnaire.

Each subject performed the scenario once using the voice input and once using the manual typing input.

The concept was to give the subjects the minimum opportunity to train the system, in order to show that training time can be reduced in favor of other activities without compromising the accuracy of the system.

If the results were positive under these circumstances, it would indicate that the system is not only faster, but more reliable too.

Ten individuals took part in the experiment, nine men and one woman; all of them were officers of the U.S.N. and U.S.C.G.


VII.  DEPENDENT  VARIABLES

During  all  trials,  the  following  were  measured: 

1)  Time to complete the scenario using the VERBEX Series 5000 Version 3.00 Voice Recognizer as an input mode.

2)  Time to complete the scenario with manual input mode.

3)  Number of oral input command errors.

4)  Number of manual input command errors.

5)  Time to complete every utterance included in the grammar file using the oral mode.

6)  Time to complete every utterance included in the grammar file using the manual mode.

The author was interested in the number of times the computers were instructed to do something wrong. Therefore, when typing "Target" for example, a command that was typed incorrectly was counted as one error, whether one or several keystrokes were actually wrong. Similarly for voice input, if a subject spoke the wrong scenario command, the Voice Recognizer may have recognized the voice input correctly, but because the result was a wrong command to the Tactical Tables it was counted as an error (Poock, 1980). [Ref. 8]

An error was also counted if the subject mispronounced an utterance and the Recognizer either did not reply or answered with a wrong command. In other words, the author was interested in a detailed analysis of how many times one voice utterance might be confused with another, because in a real situation the operator might not be able to afford any mistake.
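The counting rule described above can be illustrated with a minimal sketch in Python. This is not the procedure the author actually used to tally errors; the function names and the sample commands are hypothetical, and the only assumptions are the ones stated in the text: one error per wrong command regardless of how many keystrokes are wrong, and one error when the Recognizer gives no reply (represented here as None).

```python
from itertools import zip_longest


def normalise(command):
    """Ignore case and spacing so that only the command content is compared."""
    if command is None:
        return None
    return " ".join(command.upper().split())


def count_command_errors(expected, received):
    """Count wrong commands, one error per command.

    A missing reply (None) also counts as one error, however many
    keystrokes or words inside a command happen to be wrong.
    """
    errors = 0
    for want, got in zip_longest(expected, received):
        if normalise(want) != normalise(got):
            errors += 1
    return errors


if __name__ == "__main__":
    scenario = ["04 P", "06 P", "060 R"]
    typed = ["04 P", "0g Q", "060 R"]  # two wrong keystrokes, one wrong command
    print(count_command_errors(scenario, typed))
```

Comparing whole commands rather than characters is what makes the measure reflect how often the Tactical Tables would have been told to do something wrong.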

All the data were collected before the subjects answered the questionnaire. The questions can be found in Appendix A.


VIII.  RESULTS


A.  RESULTS  FOR  SCENARIO  TIMES 

Table I below shows the time each individual took to perform the set of actions in the scenario when using the Voice Recognizer, and the equivalent time when performing the experiment with manual input.

Table I. Times for Oral and Manual Input Method.

Individual #    Oral Input Mode        Manual Input Mode
                min   sec   msec       min   sec   msec
1st             02    41    990        08    31    660
2nd             02    43    020        09    43    020
3rd             03    24    380        08    02    020
4th             02    53    920        10    31    850
5th             02    30    660        10    45    110
6th             02    39    930        11    08    060
7th             03    43    980        10    50    730
8th             03    12    720        08    56    100
9th             02    57    700        09    18    770
10th            02    43    520        11    02    860
Mean Value      02    57    182        09    53    018

As can be seen in Table I, voice input was faster than manual typing input for every subject, by a factor of about 3.3 on average.

The mean time needed to perform the scenario with voice input was 2.95 min, versus 9.88 min for the same scenario with manual input.

This  is  a  statistically  significant  difference  in  favor  of 
voice  input  and  this  becomes  even  more  important  when  we  take 
into  account  that  the  subjects  had  only  used  voice  input  for 
no  more  than  an  hour  and  a  half. 
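The mean values and the speed ratio can be recomputed from the entries of Table I. The following is a minimal sketch in Python, added here only as a check on the arithmetic; the variable and function names are illustrative and the times are transcribed from the table. Note that it reports no statistical test, since the source does not say which test was used.

```python
# (min, sec, msec) per subject, transcribed from Table I
ORAL = [(2, 41, 990), (2, 43, 20), (3, 24, 380), (2, 53, 920), (2, 30, 660),
        (2, 39, 930), (3, 43, 980), (3, 12, 720), (2, 57, 700), (2, 43, 520)]
MANUAL = [(8, 31, 660), (9, 43, 20), (8, 2, 20), (10, 31, 850), (10, 45, 110),
          (11, 8, 60), (10, 50, 730), (8, 56, 100), (9, 18, 770), (11, 2, 860)]


def to_ms(entry):
    """Convert a (min, sec, msec) entry to integer milliseconds."""
    minutes, seconds, msec = entry
    return (minutes * 60 + seconds) * 1000 + msec


def fmt(ms):
    """Format milliseconds as mm:ss.mmm."""
    minutes, rest = divmod(round(ms), 60_000)
    seconds, msec = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{msec:03d}"


oral_ms = [to_ms(t) for t in ORAL]
manual_ms = [to_ms(t) for t in MANUAL]

mean_oral = sum(oral_ms) / len(oral_ms)
mean_manual = sum(manual_ms) / len(manual_ms)
ratio = mean_manual / mean_oral                      # ratio of the two means
per_subject = [m / o for m, o in zip(manual_ms, oral_ms)]

if __name__ == "__main__":
    print("mean oral  :", fmt(mean_oral))
    print("mean manual:", fmt(mean_manual))
    print("ratio of means: %.2f" % ratio)
    print("per-subject ratios: %.2f to %.2f" % (min(per_subject), max(per_subject)))
```

The recomputed means agree with the "Mean Value" row of the table, and the ratio of the means is close to the reported factor of 3.3. The per-subject ratios show how much the advantage varies from one individual to another.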

When a second pass was performed immediately after the first one, the time improved in seven (7) out of ten (10) cases; in the remaining three (3) cases it deteriorated by less than 20 sec compared to the first pass.

On  the  other  hand  there  was  no  difference  in  typing 
ability  with  respect  to  times. 

B.  RESULTS  FOR  ERRORS 

Figure 5 illustrates the errors input to the system in Oral and Manual Mode. One can see that the oral (voice input) method consistently produced fewer errors.

C.  TIMING  RESULTS  ON  INDIVIDUAL  UTTERANCES 

Table II below shows the time each individual took to perform every utterance included in the grammar file in Oral Mode and Manual Mode respectively.

This was done because in reality the operator does not always have to enter so much operational information into the Tactical Tables; the per-utterance times give a more complete picture of the timing performance of the two modes.


Table II. Timing results on individual utterances.

Subject #   Utterance #   Oral Mode      Manual Mode
                          sec   msec     sec   msec
1st         1st           05    690      10    070
            2nd           03    370      08    590
            3rd           02    680      08    790
            4th           02    300      04    610
            5th           02    950      05    790
            6th           02    890      04    820
2nd         1st           07    010      12    870
            2nd           03    600      10    210
            3rd           02    960      10    950
            4th           02    690      05    780
            5th           03    150      06    890
            6th           03    120      05    930
3rd         1st           06    740      15    910
            2nd           03    820      08    050
            3rd           03    310      08    640
            4th           03    090      05    340
            5th           03    620      06    690
            6th           03    810      04    810
4th         1st           05    330      18    610
            2nd           03    380      14    140
            3rd           02    620      13    870
            4th           02    690      06    450
            5th           02    990      08    860
            6th           03    160      07    010
5th         1st           07    020      18    330
            2nd           03    700      15    140
            3rd           02    570      12    870
            4th           02    890      06    450
            5th           03    250      09    330
            6th           03    110      07    100
10th        1st           06    560      14    720
            2nd           03    410      11    160
            3rd           02    590      12    050
            4th           02    630      06    630
            5th           03    070      09    030
            6th           03    360      07    45

To clarify the meaning of "utterance #" in Table II: the 1st utterance is of the type TARGET .DIGIT @1,3 TACK .DIGIT @2 POINT .DIGIT DESIGNATED .DIGIT @2 for the Oral Mode, and .DIGIT @3 - .DIGIT @2 . .DIGIT  .DIGIT @2 for the Manual Mode.

The 2nd utterance is of the type TARGET .DIGIT @2 DISPLAY .TYPE for the Oral Mode and .DIGIT @2 .TYPE for the Manual Mode. The 3rd utterance is of the type TARGET .DIGIT @2 .IDENTIFY for the Oral Mode and .DIGIT @2 .IDENTIFY for the Manual Mode.

The 4th utterance is of the type PLOT TARGET .DIGIT @2 for the Oral Mode and .DIGIT @2 P for the Manual Mode. The 5th utterance is of the type PLOT RACKET .DIGIT @3 for the Oral Mode and .DIGIT @3 R for the Manual Mode.

Finally, the 6th utterance is of the type ASSIGN TARGET .DIGIT @2 .WEAPON for the Oral Mode and .DIGIT @2 G/T/M for the Manual Mode.


D.  SUBJECTIVE  QUESTIONNAIRE  RESULTS

As mentioned above, after completing the oral and manual parts of the experimental scenario, subjects filled out a questionnaire with the following results:

(1). When asked if they had any familiarity with Tactical Data Systems, fifty percent answered NO.

(2). When asked if they had any familiarity with Voice Recognition Systems, eighty percent of them answered YES. Their familiarity was a result of the OS-3404 course they had taken at the N.P.S. as part of the coursework required for a Master of Science in Telecommunications System Management.

(3). When asked if they believed that a Continuous Voice Recognition System would be useful as a method of Voice Input versus Manual keying, all of them answered positively, adding that the operator is faster, has his/her hands free, and can concentrate better on his/her job.

(4). When asked if they believed that a Voice Recognition System as an input tool for a Tactical Data System would help increase the combat reaction time available to the Tactical Commander, all of them answered positively.


IX.  OTHER  OBSERVATIONS

A. Several subjects mentioned that with the Voice Input they felt they had better control of the situation. They had their hands free and could observe what they were doing instead of concentrating on typing the correct command.

B. During the experiment, after the fifth individual, the author noticed an increasing number of misrecognitions (not errors) with the Voice Input Mode. After the contacts (pins) of the Cartridge were cleaned, the number of misrecognitions dropped to zero for the following two experiments.

C. Instead of the standalone (independently housed) voice processor unit of the VERBEX 5000, it would be desirable to build the processor into the Tactical Table. This is technologically feasible, since voice processor add-in boards are currently available for the IBM PC and other microcomputers. All that needs to be done is to interface the voice processor circuitry to the Tactical Table. This would save space and eliminate the need to check that the Cartridge is still in the right position in rough seas. It is important to note that military specifications exceed what is offered in this version of the VERBEX 5000. Another equally important point is that it would be more realistic if the Cartridge could store the voice patterns of all operators, so that it did not need to be re-initialized each time a different user came on watch during real operations.

D. Speech Recognition Systems, while they generally do not affect the internal workings of the computer, present their greatest potential advantage in increasing the efficiency of the total human/machine system. Computer input by voice allows those who are not familiar or comfortable with the computer to comprehend and follow what the operator may see unfolding at his work station, without requiring lengthy and distracting explanations from the operator (French, 1983). [Ref. 9]

During the experiments it was noticed that the subjects exhibited signs of stress, even though the author did not impose any time limit on the experiment. Another factor which appeared to induce stress in the subjects was the presence of the author during the experiment.

After completing the Voice Input Mode, most of the subjects complained that they were not very familiar with typing. They also inquired as to how well the other subjects were performing the Manual Input Mode exercises.

When using the Voice Recognition Mode, subjects with symptoms of stress appeared to talk in longer bursts, with shorter pauses separating the bursts. Because psychological stress has a definite effect on the voice, it is reasonable to expect it to have a negative effect on the success rates of users of voice input equipment (French, 1983). [Ref. 9]

E. Experimenting further with one subject, the author noticed that when working with only one Tactical Table (one PC in our case) and using the longest of the Speech Recognition utterances, the time needed to perform the task by voice was 5 sec 950 msec. The time for the same subject to enter the same utterance in the manual typing mode was slightly shorter, 5 sec 020 msec, when he knew from the beginning what he was to type. On the other hand, when he did not know the exact content of the utterance, the time he took to perform the same task manually was 7 sec 740 msec.


X.  CONCLUSIONS


In summary, it can be said that the use of a Continuous Voice Recognition System for inputting operational information to Tactical Tables in Naval Operations is feasible. The results of the experiment show that it is faster than manual typing input by a factor of 3.3, and at the same time more accurate by a factor of 5.69. In this way operators can decrease their reaction time in a real situation while keeping errors to a minimum.

Operators can decrease their dependence on factors such as darkness and the space available to move about to enter data, as well as minimize their dependence on the cooperation of others. During peacetime, supervisors can reduce the operators' training hours in favor of other activities.

On the other hand, to increase the credibility of such a system it will be necessary to create a recognizer file which is able to provide a feedback response to the user before the execution of crucial orders, such as an order to fire a missile at a previously defined target. This of course could further decrease the reaction time of the system.


APPENDIX  A


EXPERIMENTAL DATA SHEET (QUESTIONNAIRE)

1.  M _  F _

2.  AGE _

3.  SERVICE _

4.  FAMILIARITY WITH TACTICAL DATA SYSTEMS  Y _  N _

    IF YES, WHICH ONE(S)?

5.  FAMILIARITY WITH VOICE RECOGNITION SYSTEMS  Y _  N _

    IF YES, WHICH ONE(S)?

6.  DO YOU BELIEVE THAT A CONTINUOUS VOICE RECOGNITION SYSTEM WOULD BE USEFUL AS A METHOD OF VOICE INPUT VERSUS MANUAL KEYING? WHY OR WHY NOT?

7.  DO YOU BELIEVE THAT VOICE RECOGNITION AS AN INPUT TOOL FOR A TACTICAL DATA SYSTEM WOULD HELP TO INCREASE THE COMBAT REACTION TIME AVAILABLE TO THE TACTICAL COMMANDER? WHY OR WHY NOT?


APPENDIX  B


GRAMMAR  FILE  USED  FOR  THE  EXPERIMENT 


!TACTABLE_GRAM=
#REC
#G
TARGET .DIGIT @1,3 TACK .DIGIT @2 POINT .DIGIT
& DESIGNATED .DIGIT @2
TARGET .DIGIT @2 DISPLAY .TYPE
TARGET .DIGIT @2 .IDENTIFY
PLOT RACKET .DIGIT @3
PLOT TARGET .DIGIT @2
ASSIGN TARGET .DIGIT @2 .WEAPON
.DIGIT=
0
1
2
3
4
5
6
7
8
9
.TYPE=
SURFACE
AIR
.IDENTIFY=
FRIENDLY
HOSTILE
NEUTRAL
.WEAPON=
MISSILE
TORPEDO
GUN
#TR
TARGET       |040
TACK         -
POINT        .
DESIGNATED   DESIGNATED
SURFACE      SURFACE |015
AIR          AIR |015
DISPLAY      |040
IDENTIFY     |040
FRIENDLY     FRIENDLY |015
HOSTILE      HOSTILE |015
NEUTRAL      NEUTRAL |015
PLOT         |040
RACKET       RACKET
MISSILE      M |015
TORPEDO      T |015
GUN          G |015
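
The six sentence forms of the #G section can be checked mechanically. The following is a minimal sketch in Python and is not part of the VERBEX software. It assumes that a repetition marker of the form "@n" means exactly n digits and "@1,3" means one to three digits, and that the recognized text is available as a plain string with the digits written as numerals. The name classify and the regular expressions are illustrative only. The utterance types are numbered as in Table II.

```python
import re

# Utterance types numbered as in Table II.
# Assumption: "@n" = exactly n digits, "@1,3" = one to three digits.
FORMS = {
    1: r"TARGET (\d{1,3}) TACK (\d{2}) POINT (\d) DESIGNATED (\d{2})",
    2: r"TARGET (\d{2}) DISPLAY (SURFACE|AIR)",
    3: r"TARGET (\d{2}) (FRIENDLY|HOSTILE|NEUTRAL)",
    4: r"PLOT TARGET (\d{2})",
    5: r"PLOT RACKET (\d{3})",
    6: r"ASSIGN TARGET (\d{2}) (MISSILE|TORPEDO|GUN)",
}


def classify(utterance):
    """Return (utterance type, captured fields), or None if no form matches."""
    text = " ".join(utterance.upper().split())  # normalise case and spacing
    for number, pattern in FORMS.items():
        match = re.fullmatch(pattern, text)
        if match:
            return number, match.groups()
    return None


if __name__ == "__main__":
    print(classify("Target 030 Tack 20 Point 0 Designated 01"))
    print(classify("Plot Racket 060"))
    print(classify("Plot Target 4"))  # only one digit, so no form matches
```

Because every sentence form has a fixed word order and fixed digit counts, an utterance either matches exactly one form or is rejected, which is the property a constrained grammar of this kind relies on.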
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